NSF PAR Search | NSF Public Access Repository

Improved Visual Grounding through Self-Consistent Explanations

https://doi.org/10.1109/CVPR52733.2024.01244

He, Ruozhen; Cascante-Bonilla, Paola; Yang, Ziyan; Berg, Alexander C; Ordonez, Vicente (June 2024, IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Similarity Search for Efficient Active Learning and Search of Rare Concepts

Coleman, Cody; Chou, Edward; Katz-Samuels, Julian; Culatana, Sean; Bailis, Peter; Berg, Alexander C.; Nowak, Robert; Sumbaly, Roshan; Zaharia, Matei; Yalniz, I. Zeki (January 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set instead of scanning over all of the unlabeled data. We evaluate several selection strategies in this setting on three large-scale computer vision datasets: ImageNet, OpenImages, and a de-identified and aggregated dataset of 10 billion publicly shared images provided by a large internet company. Our approach achieved mean average precision and recall similar to those of the traditional global approach while reducing the computational cost of selection by up to three orders of magnitude, enabling web-scale active learning.
Full Text Available
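The candidate-pool restriction described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on three assumptions:

- every example already has an embedding vector;
- a per-example selection score, such as model uncertainty, can be computed on demand;
- a brute-force cosine-similarity scan is an acceptable stand-in for the similarity-search index that a web-scale system would more likely use.

The names `candidate_pool`, `select_batch` and the toy embeddings are hypothetical. The point the sketch shows is that the potentially expensive scoring function runs only on neighbors of the labeled set, never on every unlabeled example.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_pool(
    embeddings: Dict[int, Sequence[float]], labeled: Set[int], k: int
) -> Set[int]:
    """Union of the k most similar unlabeled examples for each labeled example.

    Brute-force scan for clarity (an assumption of this sketch); a large-scale
    system would query a (possibly approximate) nearest-neighbor index instead.
    """
    unlabeled = [i for i in embeddings if i not in labeled]
    pool: Set[int] = set()
    for q in labeled:
        nearest = heapq.nlargest(
            k, unlabeled, key=lambda i: cosine_similarity(embeddings[q], embeddings[i])
        )
        pool.update(nearest)
    return pool


def select_batch(
    pool: Set[int], score: Callable[[int], float], budget: int
) -> List[int]:
    """Score only the candidates in the pool and return the `budget` highest."""
    scored = {i: score(i) for i in sorted(pool)}
    return heapq.nlargest(budget, scored, key=scored.__getitem__)


# Hypothetical 2-D embeddings: items 0 and 1 are close, 2 and 3 are close, 4 is opposite 0.
embeddings = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9], 4: [-1.0, 0.0]}
pool = candidate_pool(embeddings, labeled={0}, k=2)
# The scoring lookup only knows items 1 and 3; it would raise KeyError if 2 or 4 were scored.
batch = select_batch(pool, score={1: 0.2, 3: 0.7}.__getitem__, budget=1)
print(pool, batch)  # → {1, 3} [3]
```

In an iterative loop, the selected items would be labeled and added to `labeled`, so the pool would grow outward from the labeled set over successive rounds.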
Combining Multiple Cues for Visual Madlibs Question Answering

https://doi.org/10.1007/s11263-018-1096-0

Tommasi, Tatiana; Mallya, Arun; Plummer, Bryan; Lazebnik, Svetlana; Berg, Alexander C.; Berg, Tamara L. (April 2018, International Journal of Computer Vision)

This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification, and attribute prediction. We also present a method for localizing phrases from candidate answers in order to provide spatial support for feature extraction. We map each of these features, together with candidate answers, to a joint embedding space through normalized canonical correlation analysis (nCCA). Finally, we solve an optimization problem to learn to combine scores from nCCA models trained on multiple cues to select the best answer. Extensive experimental results show a significant improvement over the previous state of the art and confirm that answering questions from a wide range of types benefits from examining a variety of image cues and carefully choosing the spatial support for feature extraction.
Full Text Available
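The final step in the abstract above learns to combine per-cue nCCA scores and then selects an answer. The selection part can be sketched roughly as follows. This is not the paper's optimization, and it makes two assumptions:

- each cue has already produced one similarity score per candidate answer;
- a coarse grid search over non-negative weights summing to 1 is an acceptable substitute for the learning step. The search picks the weights that maximize accuracy on labeled questions.

The names `choose_answer` and `fit_weights`, and the toy scores, are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# scores[c][a]: similarity between the image (under cue c) and candidate answer a
CueScores = Sequence[Sequence[float]]


def choose_answer(scores: CueScores, weights: Sequence[float]) -> int:
    """Index of the candidate with the highest weighted sum of per-cue scores."""
    num_answers = len(scores[0])
    combined = [sum(w * cue[a] for w, cue in zip(weights, scores)) for a in range(num_answers)]
    return max(range(num_answers), key=combined.__getitem__)


def fit_weights(questions: List[Tuple[CueScores, int]], steps: int = 10) -> Tuple[float, ...]:
    """Coarse grid search over non-negative cue weights summing to 1.

    Hypothetical stand-in for a learned combination: keeps the first weight
    vector that maximizes accuracy on the labeled (scores, answer) pairs.
    Assumes `questions` is non-empty.
    """
    num_cues = len(questions[0][0])
    best: Tuple[float, ...] = ()
    best_acc = -1.0
    for grid in product(range(steps + 1), repeat=num_cues):
        if sum(grid) != steps:
            continue  # keep only points on the probability simplex
        weights = tuple(g / steps for g in grid)
        acc = sum(choose_answer(s, weights) == y for s, y in questions) / len(questions)
        if acc > best_acc:
            best, best_acc = weights, acc
    return best


# Two hypothetical cues (e.g. scene, attributes), three candidate answers each.
train = [
    ([[0.9, 0.1, 0.2], [0.1, 0.8, 0.3]], 0),
    ([[0.2, 0.7, 0.1], [0.6, 0.1, 0.2]], 1),
]
weights = fit_weights(train)
print(weights, [choose_answer(s, weights) for s, _ in train])  # → (0.6, 0.4) [0, 1]
```

The grid has (steps + 1) ** num_cues points before filtering, so this brute-force search is only practical for a handful of cues. A gradient-based or convex formulation would scale better.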
